Factors influencing Life Expectancy using Linear Regression
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# To enable plotting graphs in Jupyter notebook
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
cdata = pd.read_csv('/home/jayanthikishore/Downloads/ML_classwork/Week3strt/Life Expectancy Data.csv')
cdata.head()
| | Country | Year | Status | Life expectancy | Adult Mortality | Infant deaths | Alcohol | Percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | Thinness 1-19 years | Thinness 5-9 years | Income composition of resources | Schooling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.259210 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
| 1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
| 2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.470 | 9.9 |
| 3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959000 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
| 4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |
5 rows × 22 columns
cdata.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2938 entries, 0 to 2937
Data columns (total 22 columns):
 #   Column                           Non-Null Count  Dtype
---  ------                           --------------  -----
 0   Country                          2938 non-null   object
 1   Year                             2938 non-null   int64
 2   Status                           2938 non-null   object
 3   Life expectancy                  2928 non-null   float64
 4   Adult Mortality                  2928 non-null   float64
 5   Infant deaths                    2938 non-null   int64
 6   Alcohol                          2744 non-null   float64
 7   Percentage expenditure           2938 non-null   float64
 8   Hepatitis B                      2385 non-null   float64
 9   Measles                          2938 non-null   int64
 10  BMI                              2904 non-null   float64
 11  Under-five deaths                2938 non-null   int64
 12  Polio                            2919 non-null   float64
 13  Total expenditure                2712 non-null   float64
 14  Diphtheria                       2919 non-null   float64
 15  HIV/AIDS                         2938 non-null   float64
 16  GDP                              2490 non-null   float64
 17  Population                       2286 non-null   float64
 18  Thinness 1-19 years              2904 non-null   float64
 19  Thinness 5-9 years               2904 non-null   float64
 20  Income composition of resources  2771 non-null   float64
 21  Schooling                        2775 non-null   float64
dtypes: float64(16), int64(4), object(2)
memory usage: 505.1+ KB
cdata.shape
(2938, 22)
cdata.columns
Index(['Country', 'Year', 'Status', 'Life expectancy', 'Adult Mortality',
'Infant deaths', 'Alcohol', 'Percentage expenditure', 'Hepatitis B',
'Measles', 'BMI', 'Under-five deaths', 'Polio', 'Total expenditure',
'Diphtheria', 'HIV/AIDS', 'GDP', 'Population', 'Thinness 1-19 years',
'Thinness 5-9 years', 'Income composition of resources', 'Schooling'],
dtype='object')
# Drop every row that has at least one missing value (2938 rows -> 1649 rows)
cdata = cdata.dropna()
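Dropping every incomplete row is simple but costly here. `cdata.info()` above reports 2938 rows, and the `X.shape` output further down shows 1649 left, so roughly 44% of the rows are discarded. Population, Hepatitis B and GDP have the fewest non-null values in that `info()` output. Checking where the gaps are before calling `dropna()` shows which columns drive most of the loss. Below is a minimal sketch with a hypothetical helper `missing_value_report`, run on a toy frame with made-up values. On the real data it would be called on `cdata` before the `dropna()` line.

```python
import numpy as np
import pandas as pd

def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (100 * counts / len(df)).round(1),
    })
    return report.sort_values("missing", ascending=False)

# Toy frame standing in for the raw CSV (hypothetical values)
toy = pd.DataFrame({
    "GDP": [584.3, np.nan, 631.7, np.nan],
    "Population": [3.3e7, 3.2e5, np.nan, 3.7e6],
    "Schooling": [10.1, 10.0, 9.9, 9.8],
})
print(missing_value_report(toy))

# dropna() keeps only rows with no missing value in any column
print(len(toy), "->", len(toy.dropna()))
```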
# Count the unique values in each column of the dataframe
cdata.nunique()
Country                            133
Year                                16
Status                               2
Life expectancy                    320
Adult Mortality                    369
Infant deaths                      165
Alcohol                            833
Percentage expenditure            1645
Hepatitis B                         83
Measles                            603
BMI                                538
Under-five deaths                  199
Polio                               68
Total expenditure                  669
Diphtheria                          66
HIV/AIDS                           167
GDP                               1649
Population                        1647
Thinness 1-19 years                179
Thinness 5-9 years                 185
Income composition of resources    548
Schooling                          147
dtype: int64
plt.figure(figsize=(10,7))
plt.scatter(cdata['Schooling'], cdata['Life expectancy'], color='#c3a935')
plt.title('Life expectancy Vs Schooling', fontsize=14)
plt.xlabel('Schooling', fontsize=14)
plt.ylabel('Life expectancy', fontsize=14)
plt.grid(True)
plt.show()
plt.figure(figsize=(10,7))
plt.scatter(cdata['Measles'], cdata['Life expectancy'], color='#359dc3')
plt.title('Life expectancy Vs Measles ', fontsize=14)
plt.xlabel('Measles', fontsize=14)
plt.ylabel('Life expectancy', fontsize=14)
plt.grid(True)
plt.show()
plt.figure(figsize=(10,7))
plt.scatter(cdata['Adult Mortality'], cdata['Life expectancy'], color='#a035c3')
plt.title('Life expectancy Vs Adult Mortality', fontsize=14)
plt.xlabel('Adult Mortality', fontsize=14)
plt.ylabel('Life expectancy', fontsize=14)
plt.grid(True)
plt.show()
plt.figure(figsize=(10,7))
# Only the first 200 rows (by position) are plotted so the country labels stay readable
plt.scatter(cdata['Life expectancy'].iloc[:200], cdata['Country'].iloc[:200], color='#c36b35')
plt.title('Country Vs Life expectancy', fontsize=14)
plt.xlabel('Life expectancy', fontsize=14)
plt.ylabel('Country', fontsize=14)
plt.grid(True)
plt.show()
sns.pairplot(cdata, height=3, diag_kind='auto', corner=True)
plt.show()
plt.figure(figsize=(10,10))
sns.boxplot(y=cdata['Life expectancy'])  # passing the data as y draws a vertical box
plt.show()
plt.figure(figsize=(8,8))
sns.boxplot(x="Status",y="Life expectancy",data=cdata)
plt.show()
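By default, seaborn's `boxplot` extends the whiskers to 1.5 times the interquartile range (IQR) beyond the quartiles (`whis=1.5`) and draws points beyond them individually. The same rule can be applied numerically to list the points a box plot flags. The sketch below uses a hypothetical helper `iqr_outliers` on made-up values. Quartiles use pandas' default linear interpolation, so the result should be close to, but is not guaranteed to match, what the plot shows.

```python
import pandas as pd

def iqr_outliers(values, whis: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return s[(s < lower) | (s > upper)]

# Hypothetical life-expectancy values
sample = [65.0, 59.9, 59.9, 59.5, 59.2, 58.8, 58.6, 36.3]
print(iqr_outliers(sample))
```

On the notebook's data, `iqr_outliers(cdata['Life expectancy'])` would list the rows drawn as separate points in the first box plot.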
# numeric_only=True skips the text columns (Country, Status); recent pandas raises an error without it
cdata.corr(numeric_only=True)['Life expectancy']
Year                               0.050771
Life expectancy                    1.000000
Adult Mortality                   -0.702523
Infant deaths                     -0.169074
Alcohol                            0.402718
Percentage expenditure             0.409631
Hepatitis B                        0.199935
Measles                           -0.068881
BMI                                0.542042
Under-five deaths                 -0.192265
Polio                              0.327294
Total expenditure                  0.174718
Diphtheria                         0.341331
HIV/AIDS                          -0.592236
GDP                                0.441322
Population                        -0.022305
Thinness 1-19 years               -0.457838
Thinness 5-9 years                -0.457508
Income composition of resources    0.721083
Schooling                          0.727630
Name: Life expectancy, dtype: float64
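Sorting by absolute correlation makes the strongest linear associations easier to spot. In the output above, Schooling (0.728), Income composition of resources (0.721) and Adult Mortality (-0.703) top the list. Pearson correlation measures only pairwise linear association, not causal influence. The sketch below uses a hypothetical helper `rank_correlations` on synthetic data. It assumes pandas 1.5 or later for `numeric_only`.

```python
import numpy as np
import pandas as pd

def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every numeric column with `target`,
    ordered by absolute strength (target itself excluded)."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Synthetic data: 'a' rises with the target, 'b' falls, 'c' is pure noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
toy = pd.DataFrame({
    "target": t,
    "a": t + 0.1 * rng.normal(size=200),
    "b": -t + 1.0 * rng.normal(size=200),
    "c": rng.normal(size=200),
    "label": ["x"] * 200,  # non-numeric column, ignored by numeric_only=True
})
print(rank_correlations(toy, "target"))
```

In the notebook, the equivalent call would be `rank_correlations(cdata, 'Life expectancy')`.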
plt.figure(figsize=(20,20))
sns.heatmap(cdata.corr(numeric_only=True), annot=True, fmt=".2f")  # annotate with 2 decimals
plt.show()
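The heatmap shows pairwise correlations between predictors. The variance inflation factor (VIF) summarizes how well each predictor is explained by all the others together: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j on the remaining columns. Values above 5 or 10 are often-quoted rules of thumb for problematic collinearity. statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. The numpy-only sketch below (hypothetical helper `vif`, synthetic data) just makes the definition explicit. The column names only mimic pairs like Infant deaths / Under-five deaths.

```python
import numpy as np
import pandas as pd

def vif(df: pd.DataFrame) -> pd.Series:
    """VIF of each column: 1 / (1 - R^2_j), where R^2_j comes from an
    ordinary least-squares fit of column j on all other columns plus an intercept."""
    X = df.to_numpy(dtype=float)
    n = X.shape[0]
    out = {}
    for j, name in enumerate(df.columns):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out[name] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(out)

# Synthetic data (hypothetical): two near-duplicate columns and one independent column
rng = np.random.default_rng(1)
u = rng.normal(size=500)
toy = pd.DataFrame({
    "infant": u + 0.05 * rng.normal(size=500),
    "under_five": u + 0.05 * rng.normal(size=500),
    "alcohol": rng.normal(size=500),
})
print(vif(toy).round(1))
```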
X = cdata.drop('Life expectancy', axis=1)
y = cdata[['Life expectancy']]
print(X.head())
print(y.head())
       Country  Year      Status  Adult Mortality  Infant deaths  Alcohol  \
0  Afghanistan  2015  Developing            263.0             62     0.01
1  Afghanistan  2014  Developing            271.0             64     0.01
2  Afghanistan  2013  Developing            268.0             66     0.01
3  Afghanistan  2012  Developing            272.0             69     0.01
4  Afghanistan  2011  Developing            275.0             71     0.01

   Percentage expenditure  Hepatitis B  Measles   BMI  ...  Polio  \
0               71.279624         65.0     1154  19.1  ...    6.0
1               73.523582         62.0      492  18.6  ...   58.0
2               73.219243         64.0      430  18.1  ...   62.0
3               78.184215         67.0     2787  17.6  ...   67.0
4                7.097109         68.0     3013  17.2  ...   68.0

   Total expenditure  Diphtheria  HIV/AIDS         GDP  Population  \
0               8.16        65.0       0.1  584.259210  33736494.0
1               8.18        62.0       0.1  612.696514    327582.0
2               8.13        64.0       0.1  631.744976  31731688.0
3               8.52        67.0       0.1  669.959000   3696958.0
4               7.87        68.0       0.1   63.537231   2978599.0

   Thinness 1-19 years  Thinness 5-9 years  Income composition of resources  \
0                 17.2                17.3                            0.479
1                 17.5                17.5                            0.476
2                 17.7                17.7                            0.470
3                 17.9                18.0                            0.463
4                 18.2                18.2                            0.454

   Schooling
0       10.1
1       10.0
2        9.9
3        9.8
4        9.5

[5 rows x 21 columns]
   Life expectancy
0             65.0
1             59.9
2             59.9
3             59.5
4             59.2
print(X.shape,y.shape )
(1649, 21) (1649, 1)
X = pd.get_dummies(X, columns=['Country', 'Status'])
X.head()
| Year | Adult Mortality | Infant deaths | Alcohol | Percentage expenditure | Hepatitis B | Measles | BMI | Under-five deaths | Polio | ... | Country_Turkmenistan | Country_Uganda | Country_Ukraine | Country_Uruguay | Country_Uzbekistan | Country_Vanuatu | Country_Zambia | Country_Zimbabwe | Status_Developed | Status_Developing | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | 19.1 | 83 | 6.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 2014 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | 18.6 | 86 | 58.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 2013 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | 18.1 | 89 | 62.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2012 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | 17.6 | 93 | 67.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2011 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | 17.2 | 97 | 68.0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
5 rows × 154 columns
# Split the data into training (70%) and test (30%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
from sklearn.linear_model import LinearRegression
linearregression = LinearRegression()
linearregression.fit(X_train, y_train)
print("Intercept of the linear equation:", linearregression.intercept_)
print("\nCoefficients of the equation are:", linearregression.coef_)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
pred = linearregression.predict(X_test)
Intercept of the linear equation: [-370.39923249]

Coefficients of the equation are: [[ 2.18341200e-01 -4.25965091e-04 7.28152544e-02 -1.15746797e-01 -3.63099062e-05 1.99567177e-03 -4.41704129e-06 -5.08329484e-03 -5.66462864e-02 -8.57849110e-04 -1.73216730e-02 2.20806233e-03 -2.78374203e-01 -4.82283672e-07 1.96856187e-09 1.03678455e-02 5.67478068e-02 5.30963077e-01 4.05983164e-01 -8.78321506e+00 7.46457656e+00 4.92555587e+00 -1.42658382e+01 6.01520191e+00 5.88048191e+00 1.94391020e+00 3.65381015e+00 2.32870321e+00 1.41649354e+00 2.22867048e+00 1.59959045e+00 1.57736884e+00 -8.60376699e+00 -2.24478634e+00 7.87205969e+00 -8.02143731e+00 4.38559558e+00 -4.76806311e+00 -7.54778984e+00 -9.85626475e+00 4.38903311e+00 -2.03678264e+00 -1.00404780e+01 1.34596802e+01 -1.29481469e+01 -1.23459633e+01 1.06404340e+01 5.31590679e+00 5.66500448e+00 -5.58832507e+00 1.08326967e+01 -1.71393790e+00 1.51385537e+00 -3.68586682e+00 4.92769106e+00 6.64362218e+00 3.66895438e+00 6.21724894e-15 -5.00185384e+00 5.82498645e+00 -1.99677849e+00 -1.40945544e-01 1.40774573e+01 -2.91411032e+00 6.03624515e+00 2.37740546e+00 -4.06255033e+00 1.29686571e+01 7.13966581e+00 -8.81099198e+00 -8.98352872e+00 -1.09935918e+00 -4.69095455e+00 5.94636052e+00 -1.19806439e+00 -1.05711230e+00 4.15411611e-01 2.39181278e+00 1.19669073e+01 3.75763791e+00 6.65903405e+00 4.32301087e+00 -1.87671319e+00 -7.36094991e+00 -2.30040589e+00 -4.97686150e+00 5.62765944e+00 -1.25610871e+01 -6.50097116e+00 -5.42286487e+00 3.29944724e+00 -3.46521938e+00 -1.22923569e+01 4.84409505e+00 6.92397336e+00 -9.19859022e+00 2.84022035e+00 -3.24066207e+00 3.71690721e+00 7.72158487e+00 -2.22980311e+00 5.84847666e+00 4.53910570e+00 -9.69720382e+00 -2.98729348e+00 -2.77460078e+00 -1.58338035e+00 9.42439431e-01 5.47594821e+00 -2.33864142e+00 -4.41127302e+00 -3.25284580e+00 8.75573924e+00 -4.08709466e+00 4.98511381e+00 5.42573096e+00 -2.41166938e-01 -2.79062495e+00 1.35647266e+00 -5.49115792e+00 -2.31232015e-02 -5.22956619e+00 5.68579389e+00
-1.75611274e+00 -2.13942249e+00 6.11010903e+00 4.39779152e+00 -1.88261721e+01 1.28478274e+00 -5.42805953e+00 2.24505808e+00 3.37555362e+00 3.24079796e+00 -6.54386423e+00 2.27728958e+00 6.27853088e+00 -9.79822596e-01 4.73178236e+00 -2.60229669e+00 -9.78893048e+00 3.76723698e+00 3.54054284e+00 5.28635744e+00 4.71282558e+00 -2.56002810e+00 -7.17678331e+00 2.20702234e+00 7.24732074e+00 -1.13919257e-01 3.85522462e+00 -7.67892834e+00 -1.04447138e+01 5.03543939e+00 -5.03543939e+00]]
# Mean Absolute Error
mean_absolute_error(y_test, pred)
1.224371976020462
# Root Mean Squared Error
mean_squared_error(y_test, pred)**0.5
2.07571589731342
# R-squared (coefficient of determination)
r2_score(y_test, pred)
0.9454598210791567
# Training Score
linearregression.score(X_train, y_train)
0.9723226823336295
#Testing score
linearregression.score(X_test, y_test)
0.9454598210791567
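`score` on a `LinearRegression` returns R², which is why the testing score and `r2_score` print the same value. The three metrics follow directly from their definitions:

- MAE = mean |y - ŷ|
- RMSE = sqrt(mean (y - ŷ)²)
- R² = 1 - SS_res / SS_tot

The numpy sketch below uses a hypothetical helper `regression_metrics` and cross-checks it against scikit-learn. The inputs are the first five Actual/Predicted pairs from the comparison table that follows, used only to exercise the function. Metrics on five rows will not equal the full test-set values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return mae, rmse, 1 - ss_res / ss_tot

y_true = [67.5, 73.8, 79.1, 54.9, 48.6]
y_pred = [67.680534, 73.785661, 80.160082, 53.138555, 51.046567]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, rmse, r2)

# The hand-written versions agree with scikit-learn
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2, r2_score(y_true, y_pred))
```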
predict = pd.DataFrame({'Actual': y_test.values.flatten(), 'Predicted': pred.flatten()})
predict
| | Actual | Predicted |
|---|---|---|
| 0 | 67.5 | 67.680534 |
| 1 | 73.8 | 73.785661 |
| 2 | 79.1 | 80.160082 |
| 3 | 54.9 | 53.138555 |
| 4 | 48.6 | 51.046567 |
| ... | ... | ... |
| 490 | 64.8 | 65.959530 |
| 491 | 71.4 | 72.459268 |
| 492 | 77.2 | 77.069198 |
| 493 | 78.6 | 77.934373 |
| 494 | 68.0 | 68.095903 |
495 rows × 2 columns
We can also visualize the comparison as a bar chart.
Note: the test set has 495 rows, so only the first 25 are plotted for readability.
df1 = predict.head(25)
df1.plot(kind='bar',figsize=(16,10))
plt.minorticks_on()  # minor grid lines are only drawn where minor ticks exist
plt.grid(which='major', linestyle='-', linewidth=0.5, color='green')
plt.grid(which='minor', linestyle=':', linewidth=0.5, color='black')
plt.show()
The training R² (0.972) and testing R² (0.945) are close to each other, which suggests the model fits well without heavy overfitting.
The test-set R² of 0.945 means the model explains about 94.5% of the variance in life expectancy across the held-out rows. Overall, the model's performance is very satisfactory.
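The 0.972 / 0.945 comparison rests on a single 70/30 split with `random_state=42`. K-fold cross-validation refits the model on several splits and reports the spread of test R² values, which shows how stable a single-split score is. Below is a sketch with scikit-learn's `cross_val_score` on synthetic data from `make_regression`. The sizes are arbitrary assumptions; in the notebook you would pass `X` and `y` instead.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for X, y (hypothetical: 300 rows, 10 features)
X_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=cv, scoring="r2")
print("R^2 per fold:", np.round(scores, 3))
print(f"mean {scores.mean():.3f} +/- {scores.std():.3f}")
```

Each country contributes many rows, so random folds place rows of the same country in both training and test data. A grouped split, such as `GroupKFold` with `Country` as the groups, would be a stricter test of generalization to unseen countries. It would also mean dropping the `Country_*` dummies, since an unseen country's column is all zeros in training.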
import statsmodels.api as sm
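# Cast bool/uint8 dummy columns to float: statsmodels rejects DataFrames that convert to an object array
X = sm.add_constant(X.astype(float))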
linearmodel = sm.OLS(y, X).fit()
predictions = linearmodel.predict(X)
print_model = linearmodel.summary()
print(print_model)
OLS Regression Results
==============================================================================
Dep. Variable: Life expectancy R-squared: 0.967
Model: OLS Adj. R-squared: 0.964
Method: Least Squares F-statistic: 294.9
Date: Tue, 20 Apr 2021 Prob (F-statistic): 0.00
Time: 15:07:59 Log-Likelihood: -3100.3
No. Observations: 1649 AIC: 6505.
Df Residuals: 1497 BIC: 7327.
Df Model: 151
Covariance Type: nonrobust
====================================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------------------
const -248.7325 22.406 -11.101 0.000 -292.683 -204.782
Year 0.2219 0.017 12.837 0.000 0.188 0.256
Adult Mortality -0.0006 0.001 -1.194 0.233 -0.002 0.000
Infant deaths 0.0497 0.016 3.204 0.001 0.019 0.080
Alcohol -0.0652 0.030 -2.151 0.032 -0.125 -0.006
Percentage expenditure -7.639e-05 0.000 -0.625 0.532 -0.000 0.000
Hepatitis B 0.0032 0.002 1.315 0.189 -0.002 0.008
Measles -6.555e-06 6.46e-06 -1.014 0.311 -1.92e-05 6.12e-06
BMI -0.0015 0.003 -0.432 0.666 -0.008 0.005
Under-five deaths -0.0372 0.011 -3.402 0.001 -0.059 -0.016
Polio -0.0006 0.003 -0.231 0.817 -0.006 0.004
Total expenditure -0.0221 0.026 -0.837 0.403 -0.074 0.030
Diphtheria 0.0009 0.003 0.300 0.764 -0.005 0.007
HIV/AIDS -0.3029 0.016 -19.157 0.000 -0.334 -0.272
GDP 1.42e-05 1.85e-05 0.770 0.442 -2.2e-05 5.04e-05
Population -2.969e-10 9.29e-10 -0.320 0.749 -2.12e-09 1.52e-09
Thinness 1-19 years 0.0115 0.033 0.351 0.726 -0.053 0.076
Thinness 5-9 years 0.0664 0.031 2.124 0.034 0.005 0.128
Income composition of resources 0.9846 0.594 1.656 0.098 -0.181 2.151
Schooling 0.2849 0.078 3.653 0.000 0.132 0.438
Country_Afghanistan -10.1219 0.638 -15.870 0.000 -11.373 -8.871
Country_Albania 6.4265 0.454 14.145 0.000 5.535 7.318
Country_Algeria 4.1914 0.525 7.980 0.000 3.161 5.222
Country_Angola -16.5790 0.699 -23.728 0.000 -17.950 -15.208
Country_Argentina 5.3187 0.627 8.478 0.000 4.088 6.549
Country_Armenia 4.7513 0.454 10.456 0.000 3.860 5.643
Country_Australia -4.3413 0.634 -6.847 0.000 -5.585 -3.098
Country_Austria -3.3310 0.728 -4.575 0.000 -4.759 -1.903
Country_Azerbaijan 2.2798 0.489 4.660 0.000 1.320 3.239
Country_Bangladesh 0.8201 0.687 1.194 0.233 -0.528 2.168
Country_Belarus 1.0368 0.588 1.763 0.078 -0.117 2.190
Country_Belgium -4.4922 0.669 -6.711 0.000 -5.805 -3.179
Country_Belize 0.5589 0.456 1.226 0.221 -0.336 1.453
Country_Benin -9.8262 0.552 -17.794 0.000 -10.909 -8.743
Country_Bhutan -2.9737 0.662 -4.490 0.000 -4.273 -1.675
Country_Bosnia and Herzegovina 7.0048 0.538 13.011 0.000 5.949 8.061
Country_Botswana -8.0765 0.519 -15.569 0.000 -9.094 -7.059
Country_Brazil 3.7303 0.537 6.944 0.000 2.676 4.784
Country_Bulgaria -11.5049 0.820 -14.039 0.000 -13.112 -9.897
Country_Burkina Faso -9.4831 0.777 -12.202 0.000 -11.008 -7.959
Country_Burundi -11.1037 0.632 -17.569 0.000 -12.343 -9.864
Country_Cabo Verde 3.4397 0.482 7.137 0.000 2.494 4.385
Country_Cambodia -3.0317 0.612 -4.953 0.000 -4.232 -1.831
Country_Cameroon -11.3621 0.606 -18.755 0.000 -12.550 -10.174
Country_Canada 12.5932 0.678 18.577 0.000 11.263 13.923
Country_Central African Republic -15.0064 0.873 -17.183 0.000 -16.720 -13.293
Country_Chad -14.0998 0.839 -16.796 0.000 -15.746 -12.453
Country_Chile 9.9676 0.647 15.418 0.000 8.699 11.236
Country_China 4.3679 1.070 4.083 0.000 2.269 6.466
Country_Colombia 4.6012 0.452 10.172 0.000 3.714 5.489
Country_Comoros -6.6889 0.546 -12.240 0.000 -7.761 -5.617
Country_Costa Rica 9.7895 0.476 20.551 0.000 8.855 10.724
Country_Croatia -8.4505 0.902 -9.371 0.000 -10.219 -6.682
Country_Cyprus -5.0392 0.803 -6.277 0.000 -6.614 -3.464
Country_Djibouti -4.0879 0.816 -5.012 0.000 -5.688 -2.488
Country_Dominican Republic 3.9597 0.458 8.654 0.000 3.062 4.857
Country_Ecuador 5.8435 0.462 12.651 0.000 4.937 6.750
Country_El Salvador 2.9605 0.456 6.492 0.000 2.066 3.855
Country_Equatorial Guinea -10.5791 1.709 -6.190 0.000 -13.931 -7.227
Country_Eritrea -5.2720 0.803 -6.569 0.000 -6.846 -3.698
Country_Estonia 4.8213 0.650 7.413 0.000 3.546 6.097
Country_Ethiopia -3.8958 0.777 -5.013 0.000 -5.420 -2.372
Country_Fiji -0.9322 0.484 -1.926 0.054 -1.881 0.017
Country_France 13.0956 0.659 19.862 0.000 11.802 14.389
Country_Gabon -4.1053 0.568 -7.223 0.000 -5.220 -2.990
Country_Georgia 4.8277 0.459 10.527 0.000 3.928 5.727
Country_Germany -3.8999 0.693 -5.624 0.000 -5.260 -2.540
Country_Ghana -6.3122 0.517 -12.198 0.000 -7.327 -5.297
Country_Greece 11.8024 0.606 19.462 0.000 10.613 12.992
Country_Guatemala 4.6049 0.568 8.108 0.000 3.491 5.719
Country_Guinea -10.1550 0.713 -14.235 0.000 -11.554 -8.756
Country_Guinea-Bissau -9.9541 0.768 -12.964 0.000 -11.460 -8.448
Country_Guyana -2.3327 0.483 -4.831 0.000 -3.280 -1.385
Country_Haiti -5.9173 1.221 -4.845 0.000 -8.313 -3.522
Country_Honduras 4.9065 0.461 10.644 0.000 4.002 5.811
Country_India -4.2958 2.936 -1.463 0.144 -10.054 1.462
Country_Indonesia -1.7606 0.567 -3.105 0.002 -2.873 -0.648
Country_Iraq 1.3724 0.544 2.522 0.012 0.305 2.440
Country_Ireland -3.3899 0.884 -3.836 0.000 -5.123 -1.656
Country_Israel 11.2500 0.585 19.227 0.000 10.102 12.398
Country_Italy -2.9275 0.675 -4.340 0.000 -4.251 -1.604
Country_Jamaica 5.7366 0.504 11.388 0.000 4.748 6.725
Country_Jordan 3.5860 0.474 7.561 0.000 2.656 4.516
Country_Kazakhstan -2.7043 0.492 -5.493 0.000 -3.670 -1.739
Country_Kenya -7.9393 0.534 -14.876 0.000 -8.986 -6.892
Country_Kiribati -3.0633 0.515 -5.950 0.000 -4.073 -2.053
Country_Latvia -11.3075 0.728 -15.539 0.000 -12.735 -9.880
Country_Lebanon 4.9847 0.486 10.264 0.000 4.032 5.937
Country_Lesotho -13.2314 0.637 -20.784 0.000 -14.480 -11.983
Country_Liberia -7.4733 0.704 -10.610 0.000 -8.855 -6.092
Country_Lithuania -12.2178 0.722 -16.923 0.000 -13.634 -10.802
Country_Luxembourg -3.6289 0.795 -4.563 0.000 -5.189 -2.069
Country_Madagascar -4.8572 0.534 -9.104 0.000 -5.904 -3.811
Country_Malawi -12.8605 0.587 -21.923 0.000 -14.011 -11.710
Country_Malaysia 4.0456 0.471 8.581 0.000 3.121 4.970
Country_Maldives 5.6889 0.526 10.819 0.000 4.657 6.720
Country_Mali -10.9538 0.694 -15.786 0.000 -12.315 -9.593
Country_Malta -4.0138 0.803 -5.001 0.000 -5.588 -2.439
Country_Mauritania -4.8228 0.676 -7.130 0.000 -6.150 -3.496
Country_Mauritius 3.0149 0.466 6.476 0.000 2.102 3.928
Country_Mexico 6.8081 0.467 14.567 0.000 5.891 7.725
Country_Mongolia -3.1167 0.454 -6.867 0.000 -4.007 -2.226
Country_Montenegro 5.1178 0.612 8.364 0.000 3.918 6.318
Country_Morocco 3.3772 0.470 7.186 0.000 2.455 4.299
Country_Mozambique -10.1473 0.606 -16.758 0.000 -11.335 -8.960
Country_Myanmar -4.2218 0.632 -6.677 0.000 -5.462 -2.981
Country_Namibia -3.6736 0.764 -4.807 0.000 -5.173 -2.174
Country_Nepal -2.5562 0.628 -4.070 0.000 -3.788 -1.324
Country_Netherlands -6.0102 1.086 -5.536 0.000 -8.140 -3.881
Country_Nicaragua 5.1988 0.457 11.368 0.000 4.302 6.096
Country_Niger -4.9206 0.967 -5.087 0.000 -6.818 -3.023
Country_Nigeria -9.0740 1.618 -5.608 0.000 -12.248 -5.900
Country_Pakistan -4.8386 1.149 -4.212 0.000 -7.092 -2.585
Country_Panama 7.7180 0.487 15.847 0.000 6.763 8.673
Country_Papua New Guinea -5.3721 0.545 -9.858 0.000 -6.441 -4.303
Country_Paraguay 4.0266 0.503 7.998 0.000 3.039 5.014
Country_Peru 4.7112 0.513 9.185 0.000 3.705 5.717
Country_Philippines -1.3310 0.516 -2.581 0.010 -2.343 -0.319
Country_Poland -9.3585 0.738 -12.678 0.000 -10.806 -7.910
Country_Portugal -4.8215 0.716 -6.730 0.000 -6.227 -3.416
Country_Romania -10.5768 0.811 -13.037 0.000 -12.168 -8.985
Country_Russian Federation -1.3800 0.550 -2.508 0.012 -2.459 -0.300
Country_Rwanda -6.2381 0.539 -11.564 0.000 -7.296 -5.180
Country_Samoa 5.0599 0.483 10.482 0.000 4.113 6.007
Country_Sao Tome and Principe -2.7811 0.521 -5.342 0.000 -3.802 -1.760
Country_Senegal -3.9767 0.647 -6.142 0.000 -5.247 -2.707
Country_Serbia 4.9743 0.566 8.782 0.000 3.863 6.085
Country_Seychelles 3.2105 0.462 6.950 0.000 2.304 4.117
Country_Sierra Leone -19.5935 0.690 -28.390 0.000 -20.947 -18.240
Country_Solomon Islands 0.1565 0.540 0.290 0.772 -0.903 1.216
Country_South Africa -5.6935 0.560 -10.171 0.000 -6.792 -4.595
Country_Spain -2.9956 0.676 -4.430 0.000 -4.322 -1.669
Country_Sri Lanka 2.9115 0.591 4.923 0.000 1.751 4.071
Country_Suriname 2.0244 0.545 3.715 0.000 0.956 3.093
Country_Swaziland -6.7701 0.703 -9.626 0.000 -8.150 -5.390
Country_Sweden -4.2674 1.043 -4.090 0.000 -6.314 -2.221
Country_Syrian Arab Republic 5.3480 0.606 8.822 0.000 4.159 6.537
Country_Tajikistan -1.8310 0.488 -3.749 0.000 -2.789 -0.873
Country_Thailand 3.8398 0.469 8.181 0.000 2.919 4.760
Country_Timor-Leste -3.3591 0.683 -4.916 0.000 -4.699 -2.019
Country_Togo -10.5289 0.667 -15.792 0.000 -11.837 -9.221
Country_Tonga 3.3079 0.514 6.438 0.000 2.300 4.316
Country_Trinidad and Tobago 2.4275 0.492 4.937 0.000 1.463 3.392
Country_Tunisia 4.5314 0.482 9.395 0.000 3.585 5.478
Country_Turkey 4.5522 0.450 10.106 0.000 3.669 5.436
Country_Turkmenistan -3.5551 0.561 -6.339 0.000 -4.655 -2.455
Country_Uganda -8.5782 0.563 -15.250 0.000 -9.682 -7.475
Country_Ukraine 1.0439 0.515 2.029 0.043 0.035 2.053
Country_Uruguay 6.6655 0.541 12.319 0.000 5.604 7.727
Country_Uzbekistan -0.9602 0.476 -2.016 0.044 -1.895 -0.026
Country_Vanuatu 3.4526 0.505 6.833 0.000 2.461 4.444
Country_Zambia -8.8363 0.574 -15.392 0.000 -9.962 -7.710
Country_Zimbabwe -10.8083 0.614 -17.617 0.000 -12.012 -9.605
Status_Developed -116.5743 10.827 -10.767 0.000 -137.812 -95.336
Status_Developing -132.1581 11.582 -11.410 0.000 -154.877 -109.439
==============================================================================
Omnibus: 667.315 Durbin-Watson: 1.239
Prob(Omnibus): 0.000 Jarque-Bera (JB): 3360.711
Skew: 1.861 Prob(JB): 0.00
Kurtosis: 8.921 Cond. No. 1.42e+24
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.22e-30. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
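The near-zero eigenvalue is consistent with the dummy-variable trap. `pd.get_dummies(X, columns=['Country', 'Status'])` keeps every level, so in each row the 133 `Country_*` columns sum to 1, as do `Status_Developed` and `Status_Developing`. `sm.add_constant` then adds a third column of ones, and the design matrix becomes rank-deficient. One visible sign: the `const` coefficient (-248.7325) equals the sum of the two Status coefficients (-116.5743 + -132.1581 = -248.7324, up to rounding). This suggests those columns are not separately identifiable. Fitted values and R² are unaffected, because they depend only on the space spanned by the columns. The individual const, Status and Country coefficients, however, should not be interpreted on their own.

Passing `drop_first=True` removes one level per variable. If each country always carries the same Status, as in the toy frame below, the Status dummy is still a combination of the Country dummies, so `drop_first=True` alone is not enough. Whether that holds for the real data could be checked with `cdata.groupby('Country')['Status'].nunique().max()`. The helper name `design_matrix` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): each country keeps one Status throughout
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Status": ["Developing", "Developing", "Developed", "Developed",
               "Developing", "Developing"],
    "Schooling": [10.1, 10.0, 15.2, 15.4, 12.0, 11.8],
})

def design_matrix(df, categorical, drop_first):
    """One-hot encode `categorical`, cast to float and prepend a constant column."""
    X = pd.get_dummies(df, columns=categorical, drop_first=drop_first).astype(float)
    X.insert(0, "const", 1.0)  # plays the role of sm.add_constant
    return X

configs = {
    "all levels + const": (toy, ["Country", "Status"], False),
    "drop_first=True": (toy, ["Country", "Status"], True),
    "drop_first=True, no Status": (toy.drop(columns="Status"), ["Country"], True),
}
for label, (df, cats, drop) in configs.items():
    X = design_matrix(df, cats, drop)
    rank = np.linalg.matrix_rank(X.to_numpy())
    print(f"{label}: {X.shape[1]} columns, rank {rank}")
```

A full-rank design is one where the rank equals the number of columns. In this toy frame, only the last configuration achieves that.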
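The statsmodels results object exposes the coefficient estimates and p-values as pandas Series (`linearmodel.params` and `linearmodel.pvalues`), which makes it easy to pull out the terms the summary marks as significant. At the 5% level, the non-country terms that pass in the table above are Year, Infant deaths, Alcohol, Under-five deaths, HIV/AIDS, Thinness 5-9 years and Schooling. Given the collinearity warning, the p-values for `const` and the Status dummies deserve caution. Below is a sketch with a hypothetical helper `significant_terms`, fed a few rows copied from the summary.

```python
import pandas as pd

def significant_terms(params: pd.Series, pvalues: pd.Series,
                      alpha: float = 0.05, exclude_prefix: str = "Country_") -> pd.DataFrame:
    """Coefficients with p-value below `alpha`, largest |coef| first.
    Terms whose name starts with `exclude_prefix` (the country dummies) are skipped."""
    table = pd.DataFrame({"coef": params, "p_value": pvalues})
    is_excluded = table.index.to_series().str.startswith(exclude_prefix)
    out = table[(table["p_value"] < alpha) & ~is_excluded]
    return out.reindex(out["coef"].abs().sort_values(ascending=False).index)

# A few rows copied from the OLS summary above
params = pd.Series({"Year": 0.2219, "Adult Mortality": -0.0006, "Alcohol": -0.0652,
                    "HIV/AIDS": -0.3029, "Schooling": 0.2849, "Country_Angola": -16.5790})
pvalues = pd.Series({"Year": 0.000, "Adult Mortality": 0.233, "Alcohol": 0.032,
                     "HIV/AIDS": 0.000, "Schooling": 0.000, "Country_Angola": 0.000})
print(significant_terms(params, pvalues))
# With the fitted model: significant_terms(linearmodel.params, linearmodel.pvalues)
```

Coefficient magnitudes depend on each variable's units, so ranking by |coef| is only a rough ordering, not a measure of importance.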
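# Residuals (actual - predicted) against fitted values
plt.figure(figsize=(10,8))
plt.scatter(linearmodel.fittedvalues, linearmodel.resid, marker='*')
plt.axhline(0, color='grey', linestyle='--')  # reference line at zero residual
plt.xlabel('Fitted values', fontsize=14)
plt.ylabel('Residuals', fontsize=14)
plt.show()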
# Distribution of the residuals
plt.figure(figsize=(10,8))
sns.histplot(linearmodel.resid, kde=False, color='red')  # distplot is deprecated since seaborn 0.11
plt.show()
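The OLS summary quantifies what the residual histogram shows. It reports Skew 1.861 and Kurtosis 8.921, against 0 and 3 for a normal distribution, and a Jarque-Bera statistic of 3360.711. The Jarque-Bera statistic combines the two as JB = n/6 · (S² + (K - 3)²/4). Plugging in n = 1649 and the rounded S and K gives about 3360.6, which suggests the summary uses this formula. The numpy sketch below uses hypothetical helper names and the moment-based estimators. In the notebook you would pass `linearmodel.resid`.

```python
import numpy as np

def skew_kurtosis(resid):
    """Moment-based skewness and (non-excess) kurtosis; a normal sample gives about 0 and 3."""
    r = np.asarray(resid, dtype=float)
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return skew, kurt

def jarque_bera(n, skew, kurt):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4)."""
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Reproduce the summary's JB from its rounded Skew and Kurtosis
print(round(jarque_bera(1649, 1.861, 8.921), 1))

# Sanity check on simulated normal residuals (hypothetical data)
rng = np.random.default_rng(0)
s, k = skew_kurtosis(rng.normal(size=100_000))
print(round(s, 2), round(k, 2))
```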